    • Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs

      Zhang, Shiqing; Naderan-Tahan, Mahmood; Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2023)
      MCM-GPUs scale performance by integrating multiple chiplets within the same package. How to partition the aggregate compute resources across chiplets poses a fundamental trade-off in performance versus cost and sustainability. ...
    • Characterizing Multi-Chip GPU Data Sharing 

      Zhang, Shiqing; Naderan-Tahan, Mahmood; Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2023)
      Multi-chip Graphics Processing Unit (GPU) systems are critical to scale performance beyond a single GPU chip for a wide variety of important emerging applications. A key challenge for multi-chip GPUs, though, is how to ...
    • Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures 

      Zhao, Xia; Eeckhout, Lieven; Jahre, Magnus (Journal article; Peer reviewed, 2022)
      Heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive accelerators are attractive as they deliver high performance at favorable cost. These architectures typically have significantly more ...
    • GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime 

      Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines ...
    • Get Out of the Valley: Power-Efficient Address Mapping for GPUs 

      Liu, Yuxi; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Luo, Yingwei; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional ...
    • HSM: A Hybrid Slowdown Model for Multitasking GPUs 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource ...
    • MDM: The GPU Memory Divergence Model 

      Wang, Lu; Jahre, Magnus; Adileh, Almutaz; Eeckhout, Lieven (Chapter, 2020)
      Analytical models enable architects to carry out early-stage design space exploration several orders of magnitude faster than cycle-accurate simulation by capturing first-order performance phenomena with a set of mathematical ...
    • Modeling Emerging Memory-Divergent GPU Applications 

      Wang, Lu; Jahre, Magnus; Adileh, Almutaz; Wang, Zhiying; Eeckhout, Lieven (Journal article; Peer reviewed, 2019)
      Analytical performance models yield valuable architectural insight without incurring the excessive runtime overheads of simulation. In this work, we study contemporary GPU applications and find that the key performance-related ...
    • NUBA: Non-Uniform Bandwidth GPUs 

      Zhao, Xia; Jahre, Magnus; Tang, Yuhua; Zhang, Guangda; Eeckhout, Lieven (Chapter, 2023)
      The parallel execution model of GPUs enables scaling to hundreds of thousands of threads, which is a key capability that many modern high-performance applications exploit. GPU vendors are hence increasing the compute and ...
    • SAC: Sharing-Aware Caching in Multi-Chip GPUs 

      Zhang, Shiqing; Naderan-Tahan, Mahmood; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2023)
      Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for their last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local memory partition while being accessible by all ...
    • Selective Replication in Memory-Side GPU Caches 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Data-intensive applications put immense strain on the memory systems of Graphics Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests across independent units to provide high bandwidth by ...
    • TEA: Time-Proportional Event Analysis 

      Gottschall, Björn; Eeckhout, Lieven; Jahre, Magnus (Chapter, 2023)
      As computer architectures become increasingly complex and heterogeneous, it becomes progressively more difficult to write applications that make good use of hardware resources. Performance analysis tools are hence critically ...